AITopics | safety protocol

Collaborating Authors

safety protocol

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Optimizing AI Agent Attacks With Synthetic Data

Loughridge, Chloe, Colognese, Paul, Griffin, Avery, Tracy, Tyler, Kutasov, Jon, Benton, Joe

arXiv.org Artificial IntelligenceNov-5-2025

As AI deployments become more complex and high-stakes, it becomes increasingly important to be able to estimate their risk. AI control is one framework for doing so. However, good control evaluations require eliciting strong attack policies. This can be challenging in complex agentic environments where compute constraints leave us data-poor. In this work, we show how to optimize attack policies in SHADE-Arena, a dataset of diverse realistic control environments. We do this by decomposing attack capability into five constituent skills -- suspicion modeling, attack selection, plan synthesis, execution and subtlety -- and optimizing each component individually. To get around the constraint of limited data, we develop a probabilistic model of attack dynamics, optimize our attack hyperparameters using this simulation, and then show that the results transfer to SHADE-Arena. This results in a substantial improvement in attack strength, reducing safety score from a baseline of 0.87 to 0.41 using our scaffold.

attack policy, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2511.02823

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.75)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)

Add feedback

`For Argument's Sake, Show Me How to Harm Myself!': Jailbreaking LLMs in Suicide and Self-Harm Contexts

Schoene, Annika M, Canca, Cansu

arXiv.org Artificial IntelligenceJul-8-2025

Recent advances in large language models (LLMs) have led to increasingly sophisticated safety protocols and features designed to prevent harmful, unethical, or unauthorized outputs. However, these guardrails remain susceptible to novel and creative forms of adversarial prompting, including manually generated test cases. In this work, we present two new test cases in mental health for (i) suicide and (ii) self-harm, using multi-step, prompt-level jailbreaking and bypass built-in content and safety filters. We show that user intent is disregarded, leading to the generation of detailed harmful content and instructions that could cause real-world harm. We conduct an empirical evaluation across six widely available LLMs, demonstrating the generalizability and reliability of the bypass. We assess these findings and the multilayered ethical tensions that they present for their implications on prompt-response filtering and context- and task-specific model development. We recommend a more comprehensive and systematic approach to AI safety and ethics while emphasizing the need for continuous adversarial testing in safety-critical AI deployments. We also argue that while certain clearly defined safety measures and guardrails can and must be implemented in LLMs, ensuring robust and comprehensive safety across all use cases and domains remains extremely challenging given the current technical maturity of general-purpose LLMs.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.0299

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

Humanoid robot malfunctions, sparks viral panic

FOX NewsJun-2-2025, 10:00:15 GMT

Kurt Knutsson talks about a viral video that shows a humanoid robot going wild. A chilling video circulating on social media has reignited old anxieties about robots turning against their creators. The footage shows a Unitree H1 humanoid robot, a machine about the size of an adult human, suddenly flailing its arms and legs with alarming force during a test, coming dangerously close to two technicians. The scene has sparked heated debate about the safety of advanced robotics and artificial intelligence. But is this truly the beginning of something out of our worst fears, or is there just a straightforward technical explanation for what happened?

cyberguy, humanoid robot, robot, (13 more...)

FOX News

Industry: Media > News (0.53)

Technology: Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.92)

Add feedback

Jailbreaking is (Mostly) Simpler Than You Think

Russinovich, Mark, Salem, Ahmed

arXiv.org Artificial IntelligenceMar-7-2025

The rapid advancement of artificial intelligence has coincided with increasing concerns regarding the safe and ethical deployment of these systems. As AI models become more capable, ensuring that their behavior aligns with societal norms and safety standards has emerged as a critical research challenge. State-of-the-art alignment techniques--such as reinforcement learning from human feedback and rulebased fine-tuning--strive to constrain models to acceptable ethical behaviors. However, these methods face an inherent tension: while alignment is designed to prevent the disclosure of harmful or sensitive information, adversaries can leverage the gap between a model's potential and its restricted behavior through what is known as a jailbreak. In the context of AI, a jailbreak is any method that circumvents established safety protocols, effectively enabling functionalities that the system would otherwise suppress. Current jailbreaks typically deploy elaborate prompt constructions or optimization strategies; in contrast, in this paper we present the Context Compliance Attack (CCA), a simple optimization-free jailbreak. CCA leverages a basic yet critical design flaw--the reliance on client-supplied conversation history--to subvert the AI systems' safeguards and jailbreak them. This paper investigates the efficacy of CCA and explores its implications on current AI safety architectures.

ai system, conversation history, language model, (14 more...)

arXiv.org Artificial Intelligence

2503.05264

Genre:

Research Report (1.00)
Overview (0.89)

Industry:

Law (0.68)
Information Technology > Security & Privacy (0.50)
Law Enforcement & Public Safety > Terrorism (0.35)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Coal Mining Question Answering with LLMs

Rivera, Antonio Carlos, Moore, Anthony, Robinson, Steven

arXiv.org Artificial IntelligenceOct-3-2024

In this paper, we present a novel approach to coal mining question answering (QA) using large language models (LLMs) combined with tailored prompt engineering techniques. Coal mining is a complex, high-risk industry where accurate, context-aware information is critical for safe and efficient operations. Current QA systems struggle to handle the technical and dynamic nature of mining-related queries. To address these challenges, we propose a multi-turn prompt engineering framework designed to guide LLMs, such as GPT-4, in answering coal mining questions with higher precision and relevance. By breaking down complex queries into structured components, our approach allows LLMs to process nuanced technical information more effectively. We manually curated a dataset of 500 questions from real-world mining scenarios and evaluated the system's performance using both accuracy (ACC) and GPT-4-based scoring metrics. Experiments comparing ChatGPT, Claude2, and GPT-4 across baseline, chain-of-thought (CoT), and multi-turn prompting methods demonstrate that our method significantly improves both accuracy and contextual relevance, with an average accuracy improvement of 15-18\% and a notable increase in GPT-4 scores. The results show that our prompt-engineering approach provides a robust, adaptable solution for domain-specific question answering in high-stakes environments like coal mining.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.02959

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Puerto Rico (0.04)
(4 more...)

Genre: Research Report > New Finding (0.48)

Industry: Materials > Metals & Mining > Coal (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

OpenAI CEO Sam Altman says Elon Musk-backed letter calling for AI pause wasn't 'optimal way to address it'

FOX NewsApr-15-2023, 02:32:50 GMT

Twitter and Tesla CEO Elon Musk weighs in on the dangers of artificial intelligence, the future of Twitter and more in an exclusive'Tucker Carlson Tonight' interview. OpenAI CEO Sam Altman says that a letter signed by Twitter CEO Elon Musk and others in the technology community calling for a pause on "giant AI experiments" wasn't the right way to address the issue. Musk, Steve Wozniak, and other tech leaders signed the letter in March, which asked AI developers to "immediately pause for at least 6 months the training of AI systems more powerful than GPT-4." During a virtual appearance at the Massachusetts Institute of Technology on Thursday, Altman addressed the letter. "There's parts of the thrust that I really agree with," Altman said, adding that his team spent more than six months after completing the training of ChatGPT 4 to study safety components before it was released.

elon musk-backed letter, openai ceo sam altman, safety component, (6 more...)

FOX News

Country:

North America > United States > Massachusetts (0.26)
North America > United States > Washington > King County > Redmond (0.06)

Industry:

Media > News (0.61)
Information Technology (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.66)

Add feedback

Elon Musk's Urgent Warning, Demands Pause on AI Research

#artificialintelligenceApr-1-2023, 00:45:22 GMT

Elon Musk, the owner of Tesla Motors, has joined forces with several other industry experts to pen an open letter calling for a pause on the further development of AI tools like OpenAI's newly launched GPT-4. The letter cites potential "risks to society and humanity" as the primary reason for this request. By uniting prominent figures like Elon Musk and leading experts in the AI field, the letter emphasizes the need for responsible development and collaboration among industry stakeholders. The open letter, signed by Musk and other industry heavyweights, stresses the importance of ensuring that power systems are developed only once we have confidence in their positive effects and manageable risks. Tesla, which uses AI for its autopilot system, showcases Musk's own involvement in the field and the gravity of his concerns.

elon musk, safety protocol, urgent warning, (8 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

Add feedback

Elon Musk, Apple co-founder, other tech experts call for pause on 'giant AI experiments': 'Dangerous race'

FOX NewsMar-29-2023, 10:44:33 GMT

As more companies rush to implement AI solutions and software, a growing number of experts are warning that it could result in an explosion of'fake news' and misinformation. Elon Musk, Steve Wozniak, and a host of other tech leaders and artificial intelligence experts are urging AI labs to pause development of powerful new AI systems in an open letter citing potential risks to society. The letter asks AI developers to "immediately pause for at least 6 months the training of AI systems more powerful than GPT-4." It was issued by the Future of Life Institute and signed by more than 1,000 people, including Musk, who argued that safety protocols need to be developed by independent overseers to guide the future of AI systems. GPT-4 is the latest deep learning model from OpenAI, which "exhibits human-level performance on various professional and academic benchmarks," according to the lab.

ai system, elon musk, openai, (10 more...)

FOX News

Country:

North America > United States > New York (0.06)
North America > United States > California (0.06)

Industry: Media > News (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)

Add feedback

Could Your Robot be Spying on You?

#artificialintelligenceSep-7-2021, 18:53:21 GMT

Researcher led by the National University of Singapore recently demonstrated that household robot vacuum cleaners can be hacked to behave like listening devices, which spy on their unsuspecting owners. Could industrial robots be similarly compromised? Hackers have exploited LiDAR (Light Detection And Ranging) scanner technology, as used in the latest iPhone, to turn a household vacuum cleaner into a spying device. If that's not sinister enough, elsewhere, in an experimental stunt, a friendly-looking humanoid robot was hacked to act like Chucky, the evil killer doll from the Child's Play movies. A video shows the robot attacking a tomato while emitting an evil laugh. The latter experiment was designed to demonstrate the vulnerabilities of technologies that can be hacked in an increasingly-connected world.

industrial robot, robot, software, (13 more...)

#artificialintelligence

Country: Asia > Singapore (0.25)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

How Conversational AI is Changing the U.S Healthcare Industry in the Post Pandemic Era – 3 Best Practices you Should Know

#artificialintelligenceJul-18-2021, 14:30:04 GMT

Technology has changed, and so has our buying habits. Today, we find advancements that can lessen our day-to-day operations and save our time & efforts. However, in the race to compete for excellence, we neglect our health and demand healthcare services quicker than expected. Global pandemic COVID-19 is one of the biggest examples of how humans can get affected by a single virus. Those with strong immunity overcome the impact of COVID and hence survive.

best practice, conversational ai, healthcare industry, (10 more...)

#artificialintelligence

Country: North America > United States (0.19)

Industry:

Health & Medicine > Health Care Providers & Services (0.81)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.77)
Health & Medicine > Therapeutic Area > Immunology (0.77)
Health & Medicine > Epidemiology (0.77)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.31)

Add feedback